class: center, middle, inverse, title-slide

# Beyond significant
## A practical introduction to Bayesian data analysis
### Andrew Ellis
### 2019-02-04

---

# Introduction

- The American Statistical Association (ASA) released a statement about p-values <a name=cite-lazarASAStatementPValues2016></a>([Lazar, 2016](https://doi.org/10.1080/00031305.2016.1154108)). Among the principles are:
    + P-values can indicate how incompatible the data are with a specified statistical model.
    + P-values do not measure the probability that the studied hypothesis is true, or the probability that the data were produced by random chance alone.
- <a name=cite-greenlandStatisticalTestsValues2016></a>[Greenland, Senn, Rothman, Carlin, Poole, Goodman, and Altman (2016)](https://doi.org/10.1007/s10654-016-0149-3) provide a good discussion of common misinterpretations of p-values and confidence intervals.
    + A confidence interval does not have a 95% chance of containing the true parameter.

???

The 95% refers only to how often 95% confidence intervals computed from very many studies would contain the true effect size if all the assumptions used to compute the intervals were correct.

These further assumptions are summarized in what is called a prior distribution, and the resulting intervals are usually called Bayesian posterior (or credible) intervals to distinguish them from confidence intervals.

---

# The Bayesian New Statistics

- <a name=cite-cummingNewStatisticsWhy2014></a>[Cumming (2014)](https://doi.org/10.1177/0956797613504966) claims that "we need to shift from reliance on NHST to estimation and other preferred techniques".
- <a name=cite-kruschkeBayesianNewStatistics2018></a>[Kruschke and Liddell (2018)](https://doi.org/10.3758/s13423-016-1221-4) advocate that Bayesian methods are better suited to achieve this, for both hypothesis testing and parameter estimation.
- According to <a name=cite-gigerenzerStatisticalRitualsReplication2018></a>[Gigerenzer (2018)](https://doi.org/10.1177/2515245918771329), we need to stop relying on NHST and instead learn to use a statistical toolkit.
- Many reviewers now demand Bayes factors, but Bayesian data analysis is not limited to calculating Bayes factors.

???

- Claim: Bayesian approaches are more direct, more intuitive and more informative than frequentist approaches.
- Bayes factors are controversial.

---

# Bayesian methods

.pull-left[
🤗

- more intuitive (quantification of uncertainty)
- able to provide evidence for/against hypotheses
- more flexible
    + cognitive process models <a name=cite-leeBayesianCognitiveModeling2013></a>([Lee and Wagenmakers, 2013](#bib-leeBayesianCognitiveModeling2013))
    + robust models
- can include prior knowledge
- better for multilevel models <a name=cite-gelmanDataAnalysisUsing2006></a>([Gelman and Hill, 2006](https://doi.org/10.1017/CBO9780511790942))
- based on probability theory (Bayes' theorem)
]

.pull-right[
😧

- requires computing power
- setting priors requires familiarity with probability distributions
- (ongoing discussion about parameter estimation vs. hypothesis testing. See e.g. [here](https://statmodeling.stat.columbia.edu/2017/05/04/hypothesis-testing-hint-not-think/) and [here](https://statmodeling.stat.columbia.edu/2011/04/02/so-called_bayes/).)
]

---

# Some theory

We will have a brief look at the theoretical background, then dive straight into a practical example.

.pull-left[
## Key ideas:

- Parameters are random variables<sup>1</sup>. These are drawn from probability distributions, which reflect our uncertainty about the parameters.
- We update the prior distribution with the likelihood (data) to obtain a posterior distribution.
]

.pull-right[
<!-- -->
]

.footnote[
[1] In contrast to frequentist analysis, in which they are fixed.
]

---

# Bayesian workflow



???
- prior predictive distribution: the marginal distribution of the data over the prior

---

# Bayesian workflow



???

A posterior predictive p-value is the tail posterior probability for a statistic generated from the model, compared to the statistic observed in the data.

---

# Bayesian model comparison



---

# A hands-on example

To make this more concrete, we will apply this to the IQ example introduced previously.

Summary: Two groups of people took an IQ test. Group 1 (N1 = 47) consumes a "smart drug", and Group 2 (N2 = 42) is a control group that consumes a placebo <a name=cite-kruschkeBayesianEstimationSupersedes2013></a>([Kruschke, 2013](https://doi.org/10.1037/a0029146)).

---

<!-- -->

---

# T-Test

```r
	Two Sample t-test

data:  IQ by Group
* t = -1.5587, df = 87, p-value = 0.1227
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -3.544155  0.428653
sample estimates:
  mean in group Placebo mean in group SmartDrug 
               100.3571                101.9149 
```

```r
	Welch Two Sample t-test

data:  IQ by Group
* t = -1.6222, df = 63.039, p-value = 0.1098
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -3.4766863  0.3611848
sample estimates:
  mean in group Placebo mean in group SmartDrug 
               100.3571                101.9149 
```

---

# T-Test as linear model

A t-test is really just a general linear model<sup>1</sup>:

$$ Y = \alpha + \beta X + \epsilon $$
$$ \epsilon \sim N(0, \sigma^2) $$

where `\(X\)` is an indicator variable.

.footnote[
[1] This assumes equal variances.
]

```r
fit_ols <- lm(IQ ~ Group, data = TwoGroupIQ)
```

which can be read as:

$$ IQ = \alpha_{Placebo} + \beta \cdot SmartDrug + \epsilon $$
$$ \epsilon \sim N(0, \sigma^2) $$

where the intercept is the mean of the placebo group and `\(SmartDrug\)` indicates membership in the smart drug group. The `\(\beta\)` parameter therefore represents the difference between groups.

---

```r
fit_ols <- lm(IQ ~ Group, data = TwoGroupIQ)

Call:
lm(formula = IQ ~ Group, data = TwoGroupIQ)

Residuals:
     Min       1Q   Median       3Q      Max 
-19.9149  -0.9149   0.0851   1.0851  22.0851 

Coefficients:
                 Estimate Std. Error t value Pr(>|t|)    
(Intercept)      100.3571     0.7263 138.184   <2e-16 ***
* GroupSmartDrug   1.5578     0.9994   1.559    0.123    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 4.707 on 87 degrees of freedom
Multiple R-squared:  0.02717,	Adjusted R-squared:  0.01599 
F-statistic: 2.43 on 1 and 87 DF,  p-value: 0.1227
```

---

# Generative model

.pull-left[
- The linear regression model is a probabilistic model
- The graph on the right shows the dependencies among the random variables
- `\(\alpha\)` is our expectation for the placebo group, `\(\beta\)` is our expectation for the difference in means. `\(\sigma\)` is the standard deviation of the outcome.
]

.pull-right[

]

---

# Software

- [JASP](https://jasp-stats.org)
- [Stan](https://mc-stan.org)
- [brms](https://github.com/paul-buerkner/brms): R package for Bayesian generalized multivariate non-linear multilevel models using Stan
- [rstanarm](http://mc-stan.org/rstanarm/): Bayesian Applied Regression Modeling via Stan
- [PyMC3](https://docs.pymc.io)
- [Turing](http://turing.ml): a universal probabilistic programming language with an intuitive modelling interface, composable probabilistic inference, and computational scalability.
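As a minimal sketch of how such a tool is used for the IQ example: with brms, the equal-variance model from the previous slides is a one-liner formula (this assumes brms is installed and the `TwoGroupIQ` data frame is in memory; the priors shown are illustrative, not the only reasonable choice):

```r
library(brms)

# Equal-variance Gaussian model: IQ as a function of Group.
# Priors are centered on the conventional IQ scale (mean 100, SD 15);
# they are illustrative choices, not prescriptions.
fit_eqvar <- brm(
  IQ ~ Group,
  data   = TwoGroupIQ,
  family = gaussian(),
  prior  = c(
    prior(normal(100, 15), class = Intercept),
    prior(normal(0, 10), class = b),
    prior(cauchy(0, 2.5), class = sigma)
  )
)

# Posterior means, credible intervals, effective sample sizes, and Rhat:
summary(fit_eqvar)
```

brms translates the formula into a Stan program (much like the one on the next slide), compiles it, and samples from the posterior with NUTS.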
---

# Inference

## Stan

```
data {
  int<lower=0> N;
  vector[N] x;
  vector[N] y;
}
parameters {
  real alpha;
  real beta;
  real<lower=0> sigma;
}
model {
  sigma ~ cauchy(0, 2.5);
  alpha ~ normal(100, 15);
  beta ~ normal(0, 10);
  y ~ normal(alpha + beta * x, sigma);
}
```

---

# Inference

## brms

```r
 Family: gaussian 
  Links: mu = identity; sigma = identity 
Formula: IQ ~ Group 
   Data: TwoGroupIQ (Number of observations: 89) 
Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup samples = 4000

Population-Level Effects: 
                 Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
Intercept          100.36      0.72    98.91   101.75       4083 1.00
* GroupSmartDrug     1.56      1.00    -0.37     3.56       4039 1.00

Family Specific Parameters: 
      Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
sigma     4.76      0.36     4.12     5.53       3504 1.00

Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample
is a crude measure of effective sample size, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
```

---

<!-- -->

---

# Model checking

<!-- -->

---

# Model revision

```
## Family: student 
##   Links: mu = identity; sigma = log; nu = identity 
## Formula: IQ ~ 0 + Group 
##          sigma ~ Group
##    Data: TwoGroupIQ (Number of observations: 89) 
## Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
##          total post-warmup samples = 4000
## 
## Population-Level Effects: 
##                      Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
## sigma_Intercept          0.01      0.19    -0.37     0.38       4343 1.00
## GroupPlacebo           100.52      0.21   100.12   100.93       4312 1.00
## GroupSmartDrug         101.55      0.36   100.85   102.24       4262 1.00
## sigma_GroupSmartDrug     0.61      0.25     0.10     1.11       3800 1.00
## 
## Family Specific Parameters: 
##    Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
## nu     1.74      0.45     1.10     2.81       3554 1.00
## 
## Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample
## is a crude measure of effective sample size, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
```

<!-- -->

<!-- -->

<!-- -->

```
## # A tibble: 1 x 2
##   .variable                          mean
##   <fct>                             <dbl>
## 1 b_GroupSmartDrug - b_GroupPlacebo    NA
```

---

# Model checking

<!-- -->

---

# Model comparison

In order to check whether including the grouping factor improves our model's predictive accuracy over a model in which we assume no group difference, we can perform a model comparison.

---

## LOO

```
##                               LOOIC    SE
## fit_eqvar                    538.76 39.30
## fit_robust                   437.01 27.48
## fit_robust_null              441.57 26.95
## fit_eqvar - fit_robust       101.75 24.67
## fit_eqvar - fit_robust_null   97.18 24.30
## fit_robust - fit_robust_null  -4.57  4.67
```

---

## Bayes factor

For a more detailed description of Bayes factors, see [here](http://rpubs.com/awellis/bayes-factor).

---

```
## [1] 2.935822
```

```
## Iteration: 1
## Iteration: 2
## Iteration: 3
## Iteration: 4
## Iteration: 1
## Iteration: 2
## Iteration: 3
## Iteration: 4
## Iteration: 5
## Iteration: 6
```

```
## [1] 3.558422
```

---

# Further reading

- **Statistical Rethinking** (an introduction to applied Bayesian data analysis with lots of example code): [book website](https://xcelab.net/rm/statistical-rethinking/) and [lectures on Youtube](https://www.youtube.com/playlist?list=PLDcUM9US4XdNM4Edgs7weiyIguLSToZRI)
- [brms](https://github.com/paul-buerkner/brms): R package for Bayesian generalized multivariate non-linear multilevel models using Stan
- [Blog post by Jonas Lindeloev](https://rpubs.com/lindeloev/bayes_factors) on how to compute Bayes factors using various methods.
- [Blog post by Matti Vuorre](https://vuorre.netlify.com/post/2017/01/02/how-to-compare-two-groups-with-robust-bayesian-estimation-using-r-stan-and-brms/#describing-the-models-underlying-the-t-tests) on how to perform a Bayesian t-test using brms
- [Blog post by A. Solomon Kurz](https://solomonkurz.netlify.com/post/robust-linear-regression-with-the-robust-student-s-t-distribution/) on how to perform robust regression.
- [Blog post by Andrew Heiss](https://www.andrewheiss.com/blog/2019/01/29/diff-means-half-dozen-ways/): Half a dozen frequentist and Bayesian ways to measure the difference in means in two groups.

---

# References

<a name=bib-cummingNewStatisticsWhy2014></a>[Cumming, G.](#cite-cummingNewStatisticsWhy2014) (2014). "The New Statistics: Why and How". En. In: _Psychological Science_ 25.1, pp. 7-29. ISSN: 0956-7976. DOI: [10.1177/0956797613504966](https://doi.org/10.1177%2F0956797613504966).

<a name=bib-gelmanDataAnalysisUsing2006></a>[Gelman, A. and J. Hill](#cite-gelmanDataAnalysisUsing2006) (2006). _Data Analysis Using Regression and Multilevel/Hierarchical Models_. En. Cambridge University Press. ISBN: 978-0-511-79094-2. DOI: [10.1017/CBO9780511790942](https://doi.org/10.1017%2FCBO9780511790942).

<a name=bib-gigerenzerStatisticalRitualsReplication2018></a>[Gigerenzer, G.](#cite-gigerenzerStatisticalRitualsReplication2018) (2018). "Statistical Rituals: The Replication Delusion and How We Got There". En. In: _Advances in Methods and Practices in Psychological Science_ 1.2, pp. 198-218. ISSN: 2515-2459. DOI: [10.1177/2515245918771329](https://doi.org/10.1177%2F2515245918771329).

<a name=bib-greenlandStatisticalTestsValues2016></a>[Greenland, S., S. J. Senn, K. J. Rothman, et al.](#cite-greenlandStatisticalTestsValues2016) (2016). "Statistical Tests, P Values, Confidence Intervals, and Power: A Guide to Misinterpretations". En. In: _European Journal of Epidemiology_ 31.4, pp. 337-350. ISSN: 1573-7284. DOI: [10.1007/s10654-016-0149-3](https://doi.org/10.1007%2Fs10654-016-0149-3).

<a name=bib-kruschkeBayesianEstimationSupersedes2013></a>[Kruschke, J. K.](#cite-kruschkeBayesianEstimationSupersedes2013) (2013). "Bayesian Estimation Supersedes the t Test". En. In: _Journal of Experimental Psychology: General_ 142.2, pp. 573-603. ISSN: 1939-2222, 0096-3445. DOI: [10.1037/a0029146](https://doi.org/10.1037%2Fa0029146).
---

<a name=bib-kruschkeBayesianNewStatistics2018></a>[Kruschke, J. K. and T. M. Liddell](#cite-kruschkeBayesianNewStatistics2018) (2018). "The Bayesian New Statistics: Hypothesis Testing, Estimation, Meta-Analysis, and Power Analysis from a Bayesian Perspective". En. In: _Psychonomic Bulletin & Review_ 25.1, pp. 178-206. ISSN: 1531-5320. DOI: [10.3758/s13423-016-1221-4](https://doi.org/10.3758%2Fs13423-016-1221-4).

<a name=bib-lazarASAStatementPValues2016></a>[Wasserstein, R. L. and N. A. Lazar](#cite-lazarASAStatementPValues2016) (2016). "The ASA's Statement on p-Values: Context, Process, and Purpose". In: _The American Statistician_ 70.2, pp. 129-133. ISSN: 0003-1305. DOI: [10.1080/00031305.2016.1154108](https://doi.org/10.1080%2F00031305.2016.1154108).

<a name=bib-leeBayesianCognitiveModeling2013></a>[Lee, M. D. and E. Wagenmakers](#cite-leeBayesianCognitiveModeling2013) (2013). _Bayesian Cognitive Modeling: A Practical Course_. OCLC: ocn861318341. Cambridge; New York: Cambridge University Press. ISBN: 978-1-107-01845-7, 978-1-107-60357-8.